Image compression using a color visual model

ABSTRACT

A system for coding images and, more particularly, a system for compressing images to a reduced number of bits by employing a Discrete Cosine Transform (DCT) in combination with a visual model.

This application claims the benefit of U.S. Provisional Application No. 60/441,583, filed Jan. 21, 2003, entitled "Automatic Image Compression Using A Color Visual Model."

BACKGROUND OF THE INVENTION

The present invention relates to a system for coding images, and more particularly, to a system for compressing images to a reduced number of bits by employing a Discrete Cosine Transform (DCT) in combination with a visual model.

There has been significant development in the compression of digital information for digital images. The effective compression of digital information is important to maintain sufficient quality of the digital image while at the same time reducing the amount of data required for representing the digital image. The transmission of digital images has gained particular importance in television systems and Internet-based transmission. If the digital images include a relatively large number of bits, a significant burden is placed on the infrastructure of communication networks involved with the creation, transmission, and re-creation of digital images. For this reason, there is a need to compress digital images to a smaller number of bits, by reducing redundancy and "invisible" image components of the images themselves.

Still image compression techniques, such as JPEG, compress digital information for digital images. As in digital compression for the transmission of digital video, JPEG compression includes a tradeoff between file size and compressed image quality. For example, JPEG compression is extensively used in digital cameras, Internet-based applications, and databases containing digital images.

Many of the image compression techniques, such as JPEG and MPEG, include a transform coding algorithm for the digital image, wherein the image is divided into blocks of pixels. For example, each block of pixels may be an 8×8 or 16×16 block of pixels. Each block of pixels then undergoes a two-dimensional transform to produce a two-dimensional array of transform coefficients. For many image coding applications, a Discrete Cosine Transform (DCT) is utilized to provide an orthogonal transform. After the block of pixels undergoes a DCT, the resulting transform coefficients are subject to compression by thresholding and quantization operations. Thresholding involves setting all coefficients whose magnitude is smaller than a threshold value equal to zero, whereas quantization involves scaling a coefficient by a step size and rounding off to the nearest integer.
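
As a minimal sketch of the transform-coding steps just described, the following Python fragment applies a two-dimensional DCT to an 8×8 block and then performs thresholding and quantization. The threshold and step-size values are illustrative placeholders only and are not taken from any particular standard.

```python
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    """Two-dimensional DCT-II of an 8x8 block (orthonormal)."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128  # level shift
coeffs = dct2(block)

threshold = 4.0   # illustrative: zero out coefficients below this magnitude
step = 16.0       # illustrative quantizer step size

coeffs[np.abs(coeffs) < threshold] = 0.0         # thresholding
quantized = np.round(coeffs / step).astype(int)  # scale by step size, round
```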

Commonly, the quantization of each DCT coefficient is determined by an entry in a quantization matrix (Q-table). A quantization matrix includes a plurality of values that are used to group sets of values together. For example, a quantization matrix may be used to group the values from 0 to 3 into group 1, values from 3 to 6 into group 2, and values from 6 to 9 into group 3. It is this matrix that is primarily responsible for the perceived image quality and the bit rate of the transmission of the image. The perceived image quality is important because the human visual system can only tolerate a certain amount of image degradation without observing a noticeable error. Therefore, certain images can tolerate significant degradation and thus be significantly compressed, whereas other images cannot tolerate significant degradation and should not be significantly compressed.
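
The grouping role of the Q-table can be illustrated with a short sketch: each DCT coefficient is divided by its corresponding table entry and rounded, so all coefficient values within one step are grouped into the same integer. The table below is the well-known JPEG luminance quantization table (Annex K of the JPEG standard); the random coefficients are placeholders.

```python
import numpy as np

# JPEG luminance quantization table (Annex K of the JPEG standard)
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=float)

coeffs = np.random.uniform(-200, 200, (8, 8))   # placeholder DCT coefficients
quantized = np.round(coeffs / Q).astype(int)    # each entry groups a range of values
dequantized = quantized * Q                     # reconstruction used by the decoder
```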

Some systems include computing a single DCT quantization matrix based on human sensitivity. One such system is based on a mathematical formula for the human contrast sensitivity function, scaled for viewing distance and display resolution, as taught in U.S. Pat. No. 4,780,716. Another such system is based on a formula for the visibility of individual DCT basis functions, as a function of viewing distance, display resolution, and display luminance. The formula is disclosed in a first article entitled "Luminance-Model-Based DCT Quantization For Color Image Compression" by A. J. Ahumada et al., published in 1992 in Human Vision, Visual Processing, and Digital Display III, Proc. SPIE 1666, Paper 32, and in a second technical article entitled "An Improved Detection Model for DCT Coefficient Quantization" by H. A. Peterson et al., published in 1993 in Human Vision, Visual Processing, and Digital Display IV, Proc. SPIE Vol. 1913, pages 191-201. The techniques described in the '716 patent and the two technical articles do not adapt the quantization matrix to the image being compressed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer network that may be used in the practice of the present invention.

FIG. 2 schematically illustrates a block diagram of an image encoding system.

FIG. 3 schematically illustrates the comparison of a pair of images.

DETAILED DESCRIPTION OF THE INVENTION

Referring to FIG. 1, a block diagram of a computer network 10 for the storing, retrieving, and transmitting of images is illustrated. A pair of image processing devices 12 and 14 are provided. The image processing device 12 may be used to perform a storage mode 16 and a retrieval mode 18 operation of the network 10 and, similarly, the image processing device 14 may be used to perform a storage mode 16 and a retrieval mode 18 operation of the network 10. The storage mode 16 accesses a disk subsystem 20, whereas the retrieval mode 18 recovers information from the disk subsystem 20. Each of the devices 12 and 14 may be any type of processing device, or otherwise a single processing device including the functionality of both devices 12 and 14. The devices 12 and 14 may further include a RAM 26, a communication channel 22, a CPU processor 24, and a display subsystem 28.

In general, the system may include, in part, a compression technique that incorporates a Discrete Cosine Transform (DCT). In the storage mode 16, an image 30 including a plurality of pixels, represented by a plurality of digital bits, is received from any suitable source through the communication channel 22 of the device 12. The device, and in particular the CPU processor 24, performs a DCT transformation, computes a DCT mask, if desired, selects a quantization matrix, and estimates a quantization matrix optimizer. The device 12 then quantizes the digital bits comprising the image 30, and performs encoding of the resulting quantized DCT coefficients, such as, for example, by run-length encoding, Huffman coding, or arithmetic coding. The resulting quantization matrix is then stored in coded form along with the coded coefficient data using any suitable technique, such as the JPEG standard. The compressed file is then stored on the disk subsystem 20 of the device 12, or otherwise transmitted to another device.

In the retrieval mode 18, the device 12 (or 14) retrieves the compressed file from the disk subsystem 20, and decodes the quantization matrix and the DCT coefficient data. The device 12 (or 14) then de-quantizes the coefficients by multiplication by the scaled quantization matrix and performs an inverse DCT. The resulting digital file containing pixel data is available for display on the display subsystem 28 of the device 12 (or 14), or can be transmitted to the device 14 (or 12) or elsewhere by the communication channel 22. The resulting digital file is illustrated in FIG. 1 as 30′ (IMAGE).
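
The storage and retrieval modes together form a forward/inverse round trip. The sketch below, under the assumption of a uniform placeholder Q-table and omitting the entropy coding and file format steps, shows the quantization performed in the storage mode 16 and the de-quantization and inverse DCT performed in the retrieval mode 18.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(b):
    return dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(b):
    return idct(idct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

Q = np.full((8, 8), 16.0)  # placeholder; a real decoder reads the Q-table from the file

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128
stored = np.round(dct2(block) / Q)       # storage mode: DCT, quantize (then entropy-code)
recovered = idct2(stored * Q) + 128      # retrieval mode: de-quantize, inverse DCT, level shift
```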

In some applications, such as digital image database applications, the image may be compressed using a Q-table and then the resulting compressed image is reconstructed and presented to the user. The user then makes adjustments to the Q-table in some fashion and the process is repeated until an acceptable compression of the image is achieved. While this achieves an acceptable result, the process is time consuming, especially for large digital image databases. While the appropriate selection of a Q-table (set of values) is desirable, it is problematic to automatically select such a table.

One existing technique for the selection of the Q-table is illustrated in U.S. Pat. No. 5,426,512, incorporated by reference herein. The error resulting from quantization for a given scale factor of the Q-table is scaled in the DCT domain by using a perceptual mask that suppresses some errors and leaves others. The result after applying the mask is then spatially pooled and compared against a target error. If sufficiently close to the target error, the current Q-table is used to compress the image. If not sufficiently close, the Q-table is adjusted. The model used is based upon a mean block luminance (for light adaptation) and a DCT coefficient that depends on thresholds based on coefficient amplitudes (for masking).
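
For concreteness, that style of adjustment can be sketched as the loop below: quantization error is weighted by a perceptual mask in the DCT domain, spatially pooled, and compared against a target error, with the Q-table scale adjusted until the pooled error is sufficiently close. The mask weights, target value, and adjustment steps are all hypothetical placeholders, not values from the '512 patent.

```python
import numpy as np
from scipy.fftpack import dct

# hypothetical perceptual mask: weight higher-frequency errors less
u = np.arange(8)
mask = 1.0 / (1.0 + 0.5 * np.add.outer(u, u))   # illustrative weights only

def pooled_masked_error(coeffs, Q, scale):
    q = np.round(coeffs / (Q * scale)) * (Q * scale)  # quantize, then de-quantize
    masked = mask * (coeffs - q)                      # suppress some errors, leave others
    return np.sqrt(np.mean(masked ** 2))              # spatial pooling (RMS)

# adjust the Q-table scale until the pooled error is close to the target
Q = np.full((8, 8), 16.0)
coeffs = dct(dct(np.random.randn(8, 8) * 50, axis=0, norm='ortho'),
             axis=1, norm='ortho')
scale, target = 1.0, 2.0
for _ in range(20):
    e = pooled_masked_error(coeffs, Q, scale)
    if abs(e - target) < 0.05 * target:
        break                                  # sufficiently close: keep this scale
    scale *= 1.1 if e < target else 0.9        # coarser steps raise the error
```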

After consideration of using a visual model within the compression process for Q-table optimization and comparison of DCT coefficients of compressed and uncompressed images, as disclosed in the '512 patent, the present inventors determined that the resulting model does not accurately reflect the user's perception of the images. The present inventors further determined that the model does not take into account the display parameters of the output device, such as the color primaries, the modulation transfer function, resolution (e.g., dpi), and tone scale. To overcome this limitation, the present inventors determined that a model, such as a visual model of the human visual system, should be used as the basis of comparison between uncompressed and compressed images in the spatial domain.

Referring to FIG. 2, the system may include an input image 50 which is to be compressed using different Q-tables (or the same Q-table modified). The discrete cosine transform coefficients 52 are calculated from the input image 50 (which may be in original form or modified by other techniques). Thresholding of the DCT coefficients may be performed, if desired. A set of quantization tables (Q-tables) 54, 56, 58, and 60 are used to quantize the discrete cosine transform coefficients. Larger values in the Q-table typically result in a smaller compressed file size, with larger compression artifacts. Similarly, smaller values in the Q-table typically result in a larger compressed file size, with smaller compression artifacts. The present inventors came to the realization that an "optimal" Q-table is not only dependent on the viewing condition, but is also dependent on the image itself. In the preferred embodiment, a set of four Q-tables may be used based upon the human visual contrast sensitivity function (CSF) using different viewing distances (such as 11, 14, 17, and 19 inches). The resolution of the intended display, the modulation transfer function of the display, the display luminance characteristics, the display color gamut, and the tone response curve of the display may be taken into consideration when creating the Q-tables. For example, closer viewing distances will result in a flatter Q-table in the frequency domain, while farther viewing distances will yield a steeper Q-table in which the higher order DCT coefficients are quantized more aggressively (with respect to the flatter Q-table).
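
One way such CSF-based Q-tables might be generated is sketched below. The Mannos-Sakrison CSF formula is an assumption on our part (the preferred embodiment does not specify a formula), as are the display dpi, the `gain` constant, and the mapping of DCT basis index to cycles per degree.

```python
import numpy as np

def csf(f_cpd):
    """Mannos-Sakrison contrast sensitivity function (an assumed formula;
    the preferred embodiment does not specify which CSF is used)."""
    return 2.6 * (0.0192 + 0.114 * f_cpd) * np.exp(-(0.114 * f_cpd) ** 1.1)

def csf_qtable(viewing_distance_in, dpi=96.0, gain=40.0):
    # pixels subtended per degree of visual angle at this viewing distance
    ppd = dpi * viewing_distance_in * np.tan(np.pi / 180.0)
    u = np.arange(8)
    # DCT basis (u, v) of an 8-point transform has u/16 cycles per pixel
    cycles_per_pixel = np.sqrt(np.add.outer((u / 16.0) ** 2, (u / 16.0) ** 2))
    f = cycles_per_pixel * ppd                 # spatial frequency in cycles/degree
    q = gain / np.maximum(csf(f), 1e-3)        # lower sensitivity -> larger step size
    q[0, 0] = q[0, 1]                          # avoid the CSF's rolloff at DC
    return np.clip(np.round(q), 1, 255)

tables = [csf_qtable(d) for d in (11, 14, 17, 19)]  # the four viewing distances
```

With this mapping, a farther viewing distance pushes each basis function to a higher frequency in cycles per degree, where sensitivity falls off, yielding the steeper Q-table described above.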

The resulting set of Q-tables includes characteristics that account for one or more of the following properties, such as, for example, the contrast sensitivity function of the human visual system, the viewing distance, the resolution of the intended display, the display luminance characteristics, the display color gamut, the tone response curve of the display, and the modulation transfer function of the display. In this manner, the Q-table is different than it would have been had one or more of these factors been omitted or added.

The DCT coefficients, and hence the resulting image after encoding, are compressed to substantially the same compression ratio. For example, each (or a plurality of) resulting image may be within 25% of the same size, within 10% of the same size, or within 5% of the same size. To achieve sufficient similarity in compression ratio, the Q-table may be scaled and the image recompressed. Accordingly, the effect of each Q-table for compressing a particular image may be more effectively compared against the effect of other Q-tables if the resulting compressed images have sufficiently similar compression ratios.
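
A sketch of the rescale-and-recompress step follows. Since re-running an entropy coder is outside the scope of this fragment, the count of nonzero quantized coefficients stands in as a crude proxy for compressed size, and the 10% tolerance and multiplicative step are illustrative.

```python
import numpy as np

def coded_size_proxy(coeffs, Q, scale):
    """Crude stand-in for compressed size: the number of nonzero quantized
    coefficients (a real system would re-run the entropy coder)."""
    return np.count_nonzero(np.round(coeffs / (Q * scale)))

def match_size(coeffs, Q, target_size, tol=0.10):
    """Scale the Q-table until the proxy size is within tol of the target."""
    scale = 1.0
    for _ in range(30):
        size = coded_size_proxy(coeffs, Q, scale)
        if abs(size - target_size) <= tol * target_size:
            break
        scale *= 1.1 if size > target_size else 0.9  # coarser steps -> smaller file
    return scale
```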

A model 62, 64, 66, and 68, such as a color visual difference model, may be used to compare the differences between the original image 50 (or otherwise an image that has not been compressed) and a reconstructed version of the respective image after quantization using the respective Q-table 54, 56, 58, 60. A color visual difference model simulates the visual perception of the human eye. One such model is described in X. Feng, J. Speigel, and A. Morimoto, "Halftone image quality evaluation using color visual models", Proc. of PICS 2002, pp. 5-10, 2002, incorporated by reference herein. Such a model collapses to CIELAB for large patches of color. The model may be calibrated so that the threshold occurs at a delta E of 1.0, regardless of the frequency and background.

The model, based upon the viewing condition and display characteristics, may calculate the visibility of the differences as a function of location in the image. The result may be a set of values or, for JPEG, a single number derived from the visual difference map. A variety of different metrics may be used, such as root mean square, median, 90th percentile, and 99th percentile. In the preferred embodiment, the 99th percentile is used and the threshold may be set to 1 delta E unit, which is approximately the visual detection threshold. The threshold may be adjusted higher for applications where quality is not critical and storage is at a premium. The threshold may also be adjusted lower for applications in which quality is critical, or in which the JPEG images may be viewed at a close distance.
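
As a simplified sketch of the comparison and pooling steps, the fragment below computes a per-pixel CIELAB delta E map (the large-patch limit to which the full color visual difference model collapses; the full model additionally accounts for frequency and background) and pools it with the metrics mentioned above. It assumes scikit-image is available for the RGB-to-CIELAB conversion.

```python
import numpy as np
from skimage.color import rgb2lab

def delta_e_map(original_rgb, reconstructed_rgb):
    """Per-pixel CIELAB delta E between the original and reconstructed
    images (a simplification; the full model also filters by the CSF)."""
    lab1 = rgb2lab(original_rgb)
    lab2 = rgb2lab(reconstructed_rgb)
    return np.linalg.norm(lab1 - lab2, axis=-1)

def pooled_error(dmap, metric="p99"):
    """Collapse the visual difference map to a single number."""
    if metric == "rms":
        return np.sqrt(np.mean(dmap ** 2))
    if metric == "median":
        return np.median(dmap)
    return np.percentile(dmap, 99)  # preferred embodiment: 99th percentile

# decision against the ~1 delta E detection threshold, e.g.:
# visible = pooled_error(delta_e_map(img, img_hat)) > 1.0
```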

Once the Q-table has been selected at block 70 based upon some criteria, the image 50 is compressed using a DCT, the selected Q-table, and encoding of the data, at block 72. The resulting image is then reconstructed and compared against the image 50 using a model, such as the color visual difference model, at block 74. If the resulting error metric E at block 76 is smaller than a low threshold (such as the threshold minus a tolerance value, which may be approximately 5% of the threshold, if desired), then a scaling factor that scales the values in the Q-table is checked at block 78 to see if it is greater than a maximum value. The scaling factor scales the Q-table in some manner and thus controls the amount of compression, which impacts the resulting image quality. If the scaling factor is not greater than the maximum value, then the scaling factor is increased at block 80. Thus, block 80 results from the case when the compression artifacts are below the visual threshold based upon some viewing condition and/or display. Therefore, the image may be compressed further to reduce the compressed image size by increasing the scale factor. The selected Q-table is then re-scaled using the modified scaling factor and the image 50 is then re-quantized using the modified Q-table. The quantized image is then reconstructed and evaluated against the image 50 using a model, such as the color visual difference model, at block 74.

The error metric is computed at block 76 and, if the error is greater than a high threshold (such as the threshold plus a tolerance value), then the scaling factor that scales the values in the Q-table is checked at block 82 to see if it is smaller than a minimum value. If the scaling factor is not less than the minimum value, then the scaling factor is decreased at block 84. Thus, block 84 results from the case when the compression artifacts are above the visual threshold based upon some viewing condition and/or display. Therefore, the image may be compressed less, increasing the compressed image size, by decreasing the scale factor. The selected Q-table is then re-scaled using the modified scaling factor and the image 50 is then re-quantized using the modified Q-table. The quantized image is then reconstructed and evaluated against the image 50 using the color visual difference model at block 74.

The error metric is computed at block 76 and, if the error is within tolerances, a suitable Q-table and scaling factor (or otherwise modified Q-table) is selected. The image may be saved in a suitable file format, such as JPEG, or otherwise transmitted to a suitable destination at block 86.
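
The block 70-86 loop may be summarized as the following sketch, where `compress` and `error_of` are hypothetical callables standing in for blocks 72-76 (compression/reconstruction and the pooled delta E metric), and the scale limits, step, and tolerance values are illustrative.

```python
def select_scale(compress, error_of, threshold=1.0, tol=0.05,
                 s=1.0, s_min=0.25, s_max=8.0, step=1.25, max_iter=20):
    """Sketch of the scaling-factor loop: compress(scale) returns the
    reconstructed image for the scaled Q-table; error_of(img) returns the
    pooled delta E against the original. All parameter values illustrative."""
    lo, hi = threshold - tol * threshold, threshold + tol * threshold
    for _ in range(max_iter):
        e = error_of(compress(s))
        if e < lo and s * step <= s_max:    # artifacts below threshold:
            s *= step                       #   compress further (block 80)
        elif e > hi and s / step >= s_min:  # artifacts above threshold:
            s /= step                       #   compress less (block 84)
        else:
            return s                        # error within tolerance: accept
    return s
```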

In another embodiment, the Q-tables may be based upon other criteria. For example, the Q-tables may represent different power spectra in the image to be compressed. This aspect relates to masking, which in turn relates to supra-threshold perception (i.e., in supra-threshold perception, the contrast is higher and more masking typically occurs). As the level of overall masking occurring in an image rises, the variation in sensitivities of the spatial frequency channels decreases. This implies that a flatter Q-table will be appropriate for that image. In the case of images with very special characteristics, such as an application that has many images of striated texture (e.g., microscopic medical images), the tables may reflect the oriented textures as well, and additional tables may be desirable.
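
One possible, purely illustrative heuristic along these lines is sketched below: a spectral-flatness measure of the image stands in for the level of overall masking, and the Q-table is blended toward its mean, i.e. flattened, as flatness rises. Neither the measure nor the blending rule is specified by this embodiment.

```python
import numpy as np

def spectral_flatness(img):
    """Geometric/arithmetic mean ratio of the power spectrum: near 1 for
    noise-like (heavily masking) content, near 0 for smooth content."""
    p = np.abs(np.fft.fft2(img)) ** 2
    p = p.flatten()[1:]                      # drop the DC term
    return np.exp(np.mean(np.log(p + 1e-12))) / (np.mean(p) + 1e-12)

def flatten_qtable(Q, flatness):
    """Blend the Q-table toward its mean as masking rises (illustrative rule;
    the embodiment only says more masking implies a flatter table)."""
    return (1.0 - flatness) * Q + flatness * np.mean(Q)
```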

Referring to FIG. 3, a graphical illustration is provided of one embodiment of a portion of the system. As illustrated, an original image 100 is encoded 102, such as by a JPEG encoder. The encoded image is then reconstructed 104. The original image 100 and the reconstructed image 104 are modeled, such as by a color visual difference model 106. The model 106 provides a visual difference map of the image 108 from which an error metric 110 may be obtained.

CLAIMS

1. An automated method for encoding an image, said method comprising:

(a) inputting image data into a processing device;

(b) said processing device quantizing a discrete cosine transform of said image using a first set of quantization values;

(c) said processing device quantizing said discrete cosine transform of said image using a second set of quantization values different from said first set of quantization values, and where neither said first set of quantization values nor said second set of quantization values is calculated using data from said image;

(d) said processing device comparing said image to a spatial reconstructed image based upon said first set of quantization values using a visual difference model that simulates the perception of the human eye;

(e) said processing device comparing said image to a spatial reconstructed image based upon said second set of quantization values using said visual difference model;

(f) said processing device selecting one of said first set of quantization values and said second set of quantization values based upon the respective said comparing; and

(g) said processing device encoding said image, on a computer-readable medium, with the selected set of quantization values so as to be viewable on a display device.

2. The method of claim 1 including the step of selectively scaling the selected one of said first set of quantization values and said second set of quantization values if a comparison of said image to said spatial reconstructed image produces an error metric between an upper threshold and a lower threshold.

3. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, the color primaries of a display.

4. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, the modulation transfer function of a display.

5. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, a tone scale of a display.

6. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, the resolution of a display.

7. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, a particular viewing distance for viewing the display.

8. The method of claim 1 wherein said comparing is based upon, at least in part, a contrast sensitivity function of the human visual system.

9. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, a color gamut of a display.

10. The method of claim 1 wherein said comparing is based upon, at least in part, a contrast sensitivity difference model.

11. The method of claim 10 wherein said model collapses to CIELAB for large patches of color.

12. The method of claim 1 wherein said spatial reconstructed image based upon said first set of quantization values and said spatial reconstructed image based upon said second set of quantization values are each reconstructed from respective digital structures having substantially the same compression ratio in relation to each other when respectively compared to said image.

13. The method of claim 1 wherein said first set of quantization values is based upon, at least in part, a luminance response of a display.

14. The method of claim 1 wherein said selecting is based upon an error measure.

15. The method of claim 1 further comprising determining a first error measure based upon said comparing of said first set and a second error measure based upon said comparing of said second set.

16. The method of claim 15 wherein said selecting is based upon said first and second error measures.

17. The method of claim 16 further comprising modifying said selected set of quantization values based upon said error measure.

18. The method of claim 17 further comprising modifying said image based upon said modified selected set of quantization values.

19. The method of claim 18 wherein said modified image is encoded.

20. An automated method for encoding an image, said method comprising:

(a) a processing device receiving a first digital image;

(b) said processing device quantizing a discrete cosine transform of said first image using a first set of quantization values;

(c) said processing device comparing said first image to a spatial reconstructed image based upon said first set of quantization values using a model to determine an error measure;

(d) based upon said error measure, said processing device scaling said first set of quantization values by applying a single common scaling factor to each quantization value within said first set of quantization values, said scaling factor having a value not dependent on information from said first image; and

(e) said processing device quantizing said discrete cosine transform of said first image using said modified first set of quantization values and encoding said first image on a computer-readable medium so as to be visually presentable on a display device.

21. The method of claim 20 wherein a scaling factor is selectively increased based upon said error measure.

22. The method of claim 21 wherein said scaling factor is selectively decreased based upon said error measure.

23. The method of claim 22 wherein said scaling factor is selectively decreased provided said error measure is greater than a threshold.

24. The method of claim 21 wherein said scaling factor is selectively increased provided said error measure is less than a threshold.

25. An automated method for encoding an image, said method comprising:

(a) receiving a first image;

(b) a processing device quantizing a discrete cosine transform of said first image using a first set of quantization values;

(c) said processing device quantizing said discrete cosine transform of said first image using a second set of quantization values different from said first set of quantization values, and where neither said first set of quantization values nor said second set of quantization values is calculated using data from said first image;

(d) said processing device comparing said first image to a spatial reconstructed image based upon said first set of quantization values using a model to determine an error measure;

(e) said processing device comparing said first image to a spatial reconstructed image based upon said second set of quantization values using said model to determine an error measure;

(f) said processing device selecting one of said first set of quantization values and said second set of quantization values based upon the respective said error measures;

(g) based upon said error measure, said processing device scaling the selected one of said sets of quantization values;

(h) said processing device quantizing said discrete cosine transform of said first image using said modified set of quantization values; and

(i) said processing device encoding said first image on a computer-readable medium so as to be visually presentable on a display device.